Chinese-Japanese Clause Alignment
Abstract
Bi-text alignment is useful for many natural language processing tasks, such as machine translation, bilingual lexicography, and word sense disambiguation. This paper presents a method for aligning Chinese-Japanese bilingual texts at the clause level. After describing some characteristics of Chinese-Japanese bilingual texts, we investigate statistical properties of a Chinese-Japanese bilingual corpus, including a correlation test of text lengths between the two languages and a distribution test of the length-ratio data. We then focus on n-m alignment modes (n > 1 or m > 1), which are prone to mismatch, and propose a similarity measure based on Hanzi character information for these modes. Using dynamic programming, we combine the statistical information with the Hanzi character information to find the alignment with the least overall cost. Experiments show that the algorithm achieves good alignment accuracy.
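The combination of length statistics and Hanzi overlap under dynamic programming can be illustrated with a minimal sketch. This is not the paper's implementation: the constants, the mode penalties, the Gaussian-style length cost, and the Dice coefficient used as the similarity measure are all assumptions chosen for illustration. As a further simplification, the Hanzi term is applied to every mode with two non-empty sides, whereas the abstract proposes it for the n-m modes.

```python
import math

# All constants below are hypothetical; the abstract gives no parameter values.
LENGTH_RATIO_MEAN = 1.0  # assumed mean Japanese/Chinese length ratio
LENGTH_RATIO_VAR = 0.5   # assumed variance of the length difference per character
HANZI_WEIGHT = 2.0       # assumed weight of the character-overlap term
# Alignment modes (n Chinese clauses, m Japanese clauses) with assumed penalties.
MODES = {(1, 1): 0.0, (1, 0): 4.0, (0, 1): 4.0,
         (2, 1): 1.5, (1, 2): 1.5, (2, 2): 3.0}


def is_hanzi(ch):
    """True for characters in the CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"


def hanzi_similarity(zh, ja):
    """Dice coefficient over the sets of ideographs in the two strings."""
    a = {c for c in zh if is_hanzi(c)}
    b = {c for c in ja if is_hanzi(c)}
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def length_cost(len_zh, len_ja):
    """Squared, variance-normalised length difference (Gaussian-style cost)."""
    scale = math.sqrt(max(len_zh, len_ja, 1) * LENGTH_RATIO_VAR)
    delta = (len_ja - LENGTH_RATIO_MEAN * len_zh) / scale
    return 0.5 * delta * delta


def align(zh_clauses, ja_clauses):
    """Return the least-cost list of (zh_indices, ja_indices) groups."""
    n_zh, n_ja = len(zh_clauses), len(ja_clauses)
    inf = float("inf")
    cost = [[inf] * (n_ja + 1) for _ in range(n_zh + 1)]
    back = [[None] * (n_ja + 1) for _ in range(n_zh + 1)]
    cost[0][0] = 0.0
    for i in range(n_zh + 1):
        for j in range(n_ja + 1):
            if cost[i][j] == inf:
                continue
            for (n, m), penalty in MODES.items():
                if i + n > n_zh or j + m > n_ja:
                    continue
                zh = "".join(zh_clauses[i:i + n])
                ja = "".join(ja_clauses[j:j + m])
                step = penalty + length_cost(len(zh), len(ja))
                if n and m:
                    step += HANZI_WEIGHT * (1.0 - hanzi_similarity(zh, ja))
                if cost[i][j] + step < cost[i + n][j + m]:
                    cost[i + n][j + m] = cost[i][j] + step
                    back[i + n][j + m] = (n, m)
    # Walk the back-pointers from the end to recover the alignment.
    groups = []
    i, j = n_zh, n_ja
    while (i, j) != (0, 0):
        n, m = back[i][j]
        groups.append((list(range(i - n, i)), list(range(j - m, j))))
        i, j = i - n, j - m
    groups.reverse()
    return groups
```

For example, `align(["東京", "大学"], ["東京大学"])` groups both Chinese-side clauses with the single Japanese-side clause, because the 2-1 mode has matching length and full character overlap. The sketch compares code points directly, so simplified and Japanese variant forms of the same character (such as 图 and 図) count as different; mapping such variants would need an external table.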
Similar Resources
Japanese-Chinese Phrase Alignment Exploiting Shared Chinese Characters
Common Chinese characters between Japanese and Chinese have been proved to be effective in Japanese-Chinese phrase alignment. Besides common Chinese characters, Japanese and Chinese also share many other semantically equivalent Chinese characters. However, there are no available resources for this kind of Chinese characters. In this paper, we propose a statistical method aiming to detect these ...
Japanese-Chinese Phrase Alignment Using Common Chinese Characters Information
We describe a method to detect common Chinese characters between Japanese and Chinese automatically by means of freely available resources and verify the effectiveness of the detecting method. We use a joint phrase alignment model on dependency trees and report results of experiments aimed at improving the alignment quality between Japanese and Chinese by incorporating the common Chinese charac...
Bursty Topics in Time Series Japanese / Chinese News Streams and their Cross-Lingual Alignment
This paper studies issues regarding topic modeling of information flow in multilingual news streams. If someone wants to find differences in the topics of Japanese news and Chinese news, it is usually necessary for him/her to carefully watch every article in Japanese and Chinese news streams at every moment. In such a situation, topic models such as LDA (Latent Dirichlet Allocation) and DTM (dy...
Alignment and Word Order in Old Japanese
Keywords: Active Alignment, Ergative Alignment, Split Intransitivity, Case, Nominalization, Verbal Prefixes, Clitic Pronouns, Nominal Hierarchy
This paper argues that Old Japanese (eighth century) had split alignment, with nominative-accusative alignment in main clauses and active alignment in nominalized clauses. The main arguments for active alignment in nominalized clauses come from ga-marking of active subjects and the distribution of two verbal prefixes: i- for active predicates and sa- for inactive predicates (cf. Yanagida, In: Hasega...